Carnegie Mellon University

Software Engineering Institute

Locality Based Analysis
of Network Flows

SEI/CERT
21 July 2004

John McHugh, Carrie Gates, Damon Becknel

© 2004 by Carnegie Mellon University



Why Locality

• Locality is an entropy-based characterization that allows prediction of future behavior based on past observations.

- It captures the degree to which the behavior of a system is regular in some sense.

- It appears to be scale free, showing up in Internet, subnet, and node scale behaviors.

- It promotes clustering, allowing the use of sets and multisets to abstract group behaviors.
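
One way to make "entropy-based" concrete is the Shannon entropy of the addresses a host contacts: low entropy means regular, predictable behavior. The sketch below is only an illustration under that assumption; the function name and the sample addresses are hypothetical and do not come from the talk.

```python
import math
from collections import Counter

def shannon_entropy(items):
    """Shannon entropy, in bits, of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical destination sequences (made-up addresses).
# A host that mostly revisits one destination shows strong locality.
regular = ["10.0.0.1"] * 8 + ["10.0.0.2"] * 2
# A host that touches a new destination every time shows none.
scattered = [f"10.0.0.{i}" for i in range(10)]

print(round(shannon_entropy(regular), 3))
print(round(shannon_entropy(scattered), 3))
```

The first sequence has much lower entropy than the second, which is the sense in which locality supports prediction from past observations.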

Eye Candy vs. Insight

• Locality often manifests as patterns in some space.

- If we select the appropriate dimensions, we may achieve either understanding or puzzlement.

- The next three pictures show persistent structure where none might be expected.

- This can be viewed as a summary of a time series of connection matrices.

- Graphics by Carrie Gates
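
The quantity plotted in the pictures that follow, the number of unique source IPs that contacted X destination IPs per hour, can be sketched as below. This is a minimal illustration, assuming flows are available as `(hour, source, destination)` tuples; the function name and the record layout are hypothetical, not the tooling used for the graphics.

```python
from collections import Counter, defaultdict

def fanout_histogram(flows):
    """Map each hour to a Counter {X: number of sources that contacted
    exactly X distinct destinations in that hour}."""
    dests = defaultdict(set)          # (hour, src) -> set of destinations
    for hour, src, dst in flows:
        dests[(hour, src)].add(dst)
    hist = defaultdict(Counter)       # hour -> {fan-out X: source count}
    for (hour, _src), dset in dests.items():
        hist[hour][len(dset)] += 1
    return hist

# Hypothetical flow records: (hour, source, destination).
sample = [
    (0, "a", "x"), (0, "a", "y"), (0, "a", "x"),
    (0, "b", "x"),
    (1, "a", "x"),
]
for hour, counts in sorted(fanout_histogram(sample).items()):
    print(hour, dict(counts))
```

Stacking these per-hour histograms over time gives the kind of surface that summarizes a time series of connection matrices.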

see

Figure: Number of Unique Source IPs that Contacted X Destination IPs Per Hour (jis routed, TCP only). Axis: Unique Source IPs.

Then it goes away ...

Figure: Number of Unique Source IPs that Contacted X Destination IPs Per Hour (jis routed, TCP only). Axes: Unique Source IPs, Unique Destination IPs, Date (August, 2003).

(rather abruptly)

Figure: Number of Unique Source IPs that Contacted X Destinations Per Hour (TCP incoming, SYN and SYN-RST flows only). Axis: Unique Destination IPs. Legend: Aug 5-8 (+), Aug 12-15 (■).

Only to return (months later).

Figure: Number of Unique Source IPs that Contacted X Destination IPs Per Hour (TCP incoming, routed and web).

Williamson’s Locality

• Matt Williamson, late of HP Bristol, noted address locality in a 2002 ACSAC paper.

- For browsing, the last 10 IPs visited constitute an effective working set.

- Working set violations are relatively rare, bursts rarer yet.

• Delay on violation is an effective “soft” mitigator.

• What is the locality of trans-border data?
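
The working set idea can be sketched as a small least-recently-used set of destination addresses, with any contact outside the set counted as a violation. This is an illustrative approximation only, not Williamson's implementation: the function name, the LRU eviction policy, and the choice to count first contacts as violations are all assumptions made here.

```python
from collections import deque

def working_set_violations(destinations, size=10):
    """Return the indices at which a destination falls outside the
    working set of the last `size` distinct addresses (LRU order)."""
    recent = deque()                  # least recently used on the left
    violations = []
    for i, dst in enumerate(destinations):
        if dst in recent:
            recent.remove(dst)        # refresh: re-appended below as newest
        else:
            violations.append(i)      # not in the working set
            if len(recent) == size:
                recent.popleft()      # evict the least recently used entry
        recent.append(dst)
    return violations

# Hypothetical sequence of destinations with a working set of 2.
print(working_set_violations(["a", "b", "a", "c"], size=2))
```

A "soft" mitigator in this picture would delay, rather than drop, the connections at the returned indices; a long run of consecutive indices corresponds to a burst.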

Detail of Inside to Outside Day

Figure: Number of Destination IPs Contacted Per Source Over Time (14 January 2003, all outgoing TCP traffic, calculated on a per hour basis). Axis (log scale): Percent of All Source IPs.

Weekly In/Out Locality Range

Figure: Number of Destination IPs Contacted Per Source Over Time (11-17 January 2003, all outgoing TCP traffic, calculated on a per hour basis).

Williamson Confirmed (mostly)

• With the caveat that we are not seeing internal connections, the vast majority of the flows arguably follow Williamson’s working set model.

• As usual, there are outliers ...

One Day of Inside to Outside

Figure: Number of Destination IPs Contacted Per Source Over Time (14 January 2003, all outgoing TCP traffic, calculated on a per hour basis). Axis (log scale): Percent of All Source IPs.

Noise localities

• We have been characterizing modest subnets in support of the traffic generation that will be used in the DARPA DQ system evaluations.

- Attempting to avoid the mistakes of the DARPA IDS evaluation.

- Striving for a realistic noise environment, among other things.

Crud and Noise

• In January, we observed a /16 for a week, and the whole customer net for a minute.

• For the /16:

MMM.NNN.24.x - 66 hosts
MMM.NNN.25.x - 60 hosts
MMM.NNN.26.x - 46 hosts
MMM.NNN.27.x - 49 hosts
MMM.NNN.28.x - 57 hosts
MMM.NNN.29.x - 7 hosts
MMM.NNN.30.x - 70 hosts
MMM.NNN.31.x - 67 hosts
MMM.NNN.32.x - 54 hosts
MMM.NNN.33.x - 62 hosts
MMM.NNN.34.x - 50 hosts
MMM.NNN.35.x - 4 hosts
MMM.NNN.120.x - 2 hosts
MMM.NNN.127.x - 1 host
MMM.NNN.140.x - 1 host
MMM.NNN.251.x - 4 hosts

Total: 600 hosts in 16 /24s

One week on the /16

Figure: inbound traffic to the /16 over one week. Axis: Date and Time. Legend: Inbound Hits, Inbound Misses.

1 Min sample - destinations

Figure: IP Destination Analysis.

1 Min sample - sources

Figure: IP Source Analysis.

top 5 in 1 min sample

• Created a “bag” for source and destination addresses in the 1 minute sample. The annotated top 5 are:

• (39) lip $ readbag -count -print jcm-tcp-s-10+.bag | sort -r -n | head

12994 AAA.BBB.068.218 - scan 4899 (Radmin)

6598 CCC.DDD.209.215 - scan 7100 (X-Font)

5944 EEE.FFF.125.117 - scan 20168 (Lovegate)

5465 GGG.HHH.114.052 - ditto

5303 III.JJJ.164.126 - scan 3127 (Mydoom)
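
A "bag" here is a multiset: each address is stored with the number of flows it appeared in. As a rough stand-in for the command pipeline above, the same top-N listing can be sketched with a Python `Counter`. The function name, the `(source, destination)` record layout, and the sample data are hypothetical; this does not reproduce the bag file format.

```python
from collections import Counter

def top_sources(flows, n=5):
    """Build a bag (multiset) of source addresses and return the n
    most frequent as (address, flow_count) pairs, largest first."""
    bag = Counter(src for src, _dst in flows)
    return bag.most_common(n)

# Hypothetical flow records: (source, destination).
sample_flows = [
    ("s1", "d1"), ("s1", "d2"), ("s1", "d3"),
    ("s2", "d1"), ("s2", "d1"),
    ("s3", "d9"),
]
for addr, count in top_sources(sample_flows, n=2):
    print(count, addr)
```

The annotations in the slide (which port each top source was scanning) came from inspecting the flows themselves; the bag only supplies the ranking.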

Bottom of bag in 1 min sample

• 3335 external hosts sent exactly one TCP flow.

- SYN probes for port 8866 are seen 449 times.

• W32.Beagle.B@mm is a mass-mailing worm that opens a back door on TCP port 8866.

- SYN probes for port 25 are seen 271 times.

- Most of the remainder are SYNs to a variety of ports, mostly with high port numbers.

- There are a number of ACK/RST packets, which are probably associated with responses to spoofed DDoS attacks.
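
The bottom-of-bag breakdown amounts to selecting the sources whose count is exactly one and then tallying the destination ports of their single flows. A minimal sketch, assuming flows are available as `(source, destination_port)` pairs; the function name and the sample records are hypothetical.

```python
from collections import Counter

def singleton_port_counts(flows):
    """Count destination ports among sources that sent exactly one flow."""
    flows = list(flows)               # iterate twice: once to count, once to filter
    per_source = Counter(src for src, _dport in flows)
    return Counter(dport for src, dport in flows if per_source[src] == 1)

# Hypothetical flow records: (source, destination port).
sample_flows = [
    ("a", 8866), ("b", 8866), ("c", 25),
    ("d", 80), ("d", 81),             # "d" sent two flows, so it is excluded
]
print(singleton_port_counts(sample_flows).most_common())
```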

Individual host profiles

• These were done by Capt. Damon Becknel, USA.

- He was looking for ways of characterizing the role of a node based on its activity patterns.

- As usual, surprising results are sometimes observed.
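
The profiles that follow plot the distribution of destination or source ports for one host. A sketch of how such a distribution might be computed, assuming flows are available as `(source, destination, sport, dport)` tuples; the function name and record layout are hypothetical and are not taken from the analysis itself.

```python
from collections import Counter

def dport_distribution(flows, host):
    """Return the fraction of `host`'s outgoing flows going to each
    destination port, as a dict {dport: fraction}."""
    counts = Counter(
        dport for src, _dst, _sport, dport in flows if src == host
    )
    total = sum(counts.values())
    if total == 0:
        return {}                     # host originated no flows
    return {port: n / total for port, n in counts.items()}

# Hypothetical flow records: (source, destination, sport, dport).
sample_flows = [
    ("h", "x", 1025, 80),
    ("h", "y", 1026, 80),
    ("h", "z", 1027, 25),
    ("o", "h", 80, 1025),             # inbound to "h", ignored
]
print(dport_distribution(sample_flows, "h"))
```

A distribution concentrated on a few well-known destination ports and one spread across many ports would look very different in such a plot, which is what makes the surprising cases stand out.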

Workstation?

Figure: Workstation? - Distribution of dport. Axis: Number of Flows.

Scanner

Figure: Scanner - Distribution of dport.



Mail Server?

Figure: Mail Server - Distribution of sport.

Web Server

Figure: Web Server - Distribution of sport.

Web Server

Figure: Web Server - Distribution of dport.

Summary

• We have provided some examples of locality on a variety of scales for a variety of representations.

• It is our hope that the general notions of locality and clustering will provide a basis for reducing the complexity of analysis.